Automatically Controlled-Vocabulary Indexing for Text Retrieval

Authors

  • Kuang-hua Chen
  • Chien-Tin Wu
Abstract

The IR community has long worked on free-term indexing; by contrast, little effort has gone into controlled-vocabulary indexing. This paper proposes a new model for controlled-vocabulary indexing. The proposed model, TF×OSDF×CSIDF, distinguishes subject-specific words from common words and domain-specific words in documents. A total of 60,400 MEDLINE records are used as training and testing data, and 100 MeSH subject headings are used as the test controlled vocabulary. Preliminary experiments show good results: precision and recall both exceed 90% when abstracts are used as training material, and precision reaches 90% while recall remains at 70% when only titles are used. The problem of indexer consistency could be alleviated by using the proposed model to automatically generate...
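The precision and recall figures quoted above can be read against a small worked example. The sketch below is only an illustration of how such scores are commonly computed for subject-heading assignment; the abstract does not say how the authors averaged their scores, so micro-averaging is an assumption here, and the record ids and headings are hypothetical.

```python
def precision_recall(assigned, gold):
    """Micro-averaged precision and recall over per-record heading sets.

    assigned, gold: dicts mapping a record id to a set of subject headings.
    Micro-averaging is an assumption; the source does not state the method.
    """
    tp = fp = fn = 0
    # Cover records that appear in either mapping.
    for rec_id in gold.keys() | assigned.keys():
        pred = assigned.get(rec_id, set())
        true = gold.get(rec_id, set())
        tp += len(pred & true)   # headings assigned and correct
        fp += len(pred - true)   # headings assigned but not in the gold set
        fn += len(true - pred)   # gold headings that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical records and headings, for illustration only.
gold = {"r1": {"Asthma", "Lung"}, "r2": {"Diabetes Mellitus"}}
assigned = {"r1": {"Asthma"}, "r2": {"Diabetes Mellitus", "Insulin"}}

p, r = precision_recall(assigned, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case two of the three assigned headings are correct and two of the three gold headings are found, so both scores are 2/3.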

Similar resources

Analysis of User Image descriptions and Automatic Image Indexing Vocabularies: An Exploratory Study

This study explores the terms assigned by users to index, manage, and describe images and compares them to indexing terms derived automatically by systems for image retrieval. Results of this study indicate that user-derived indexing vocabulary largely reflects what users see in the image or what they perceive as the overall topic of an image. This is in contrast to system-derived indexing where...

Automatic Thesaurus Extraction for Thai Text Retrieval Enhancement

A thesaurus is one of the most important components of information retrieval (IR) systems. It provides a precise, controlled vocabulary that coordinates document indexing and retrieval and thereby improves retrieval effectiveness. However, the major problem with a manual thesaurus is that building it is labor-intensive, and therefore expensive, and it is hard to update in a timely ma...

An experiment in software component retrieval

Our research centers around exploring methodologies for developing reusable software, and developing methods and tools for building inter-enterprise information systems with reusable components. In this paper, we focus on an experiment in which different component indexing and retrieval methods were tested. The results are surprising. Earlier work had often shown that controlled vocabulary inde...

Bibliographic database access using free-text and controlled vocabulary: an evaluation

This paper evaluates and compares the retrieval effectiveness of various search models, based on either automatic text-word indexing or on manually assigned controlled descriptors. Retrieval is from a relatively large collection of bibliographic material written in French. Moreover, for this French collection we evaluate improvements that result from combining automatic and manual indexing. Fir...

Bilingual Indexing for Information Retrieval with AUTINDEX

AUTINDEX is a bilingual automatic indexing system for the two languages German and English. It is being developed within the EU-funded BINDEX project. The aim of the system is to automatically index large quantities of abstracts of scientific and technical papers from several areas of engineering. Automatic indexing takes place using a controlled vocabulary provided in monolingual and bilingual...

Publication date: 1999